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STOCHASTIC  OPTIMIZATION  BY  SIMULATION: 
CONVERGENCE  PROOFS  FOR  THE  GI/G/1  QUEUE  IN 

STEADY-STATE 


PIERRE  L’ECUYER*  and  PETER  W.  GLYNN^ 


ABSTRACT 


Approaches  like  hnite  differences  with  common  random  numbers,  infinitesimal  perturbation  anal¬ 
ysis,  and  the  likelihood  ratio  method,  have  drawn  a  great  deal  of  attention  recently,  as  ways  of 
estimating  the  gradient  of  a  performance  measure  with  respect  to  continuous  parameters  in  a 
dynamic  stochastic  system.  In  this  paper,  we  study  the  use  of  such  estimators  in  stochastic  approx¬ 
imation  algorithms,  to  perform  so-called  “single-run  optimizations”  of  steady-state  systems.  Under 
mild  conditions,  for  an  objective  function  that  involves  the  mean  system  time  in  a  GI/G/1  queue, 
we  prove  that  many  variants  of  these  algorithms  converge  to  the  minimizer.  In  most  cases,  how¬ 
ever,  the  simulation  length  must  be  increased  from  iteration  to  iteration;  otherwise  the  algorithm 
may  converge  to  the  wrong  value.  One  exception  is  a  particular  implementation  of  infinitesimal 
perturbation  analysis,  for  which  the  single-run  optimization  converges  to  the  optimum  even  with 
a  fixed  (and  small)  number  of  ends  of  service  per  iteration.  As  a  by-product  of  our  convergence 
proofs,  we  obtain  some  properties  of  the  derivative  estimators  that  could  be  of  independent  interest. 
Our  analysis  exploits  the  regenerative  structure  of  the  system,  but  our  derivative  estimation  and 
optimization  algorithms  do  hot  always  take  advantage  of  that  regenerative  stucture.  In  a  com¬ 
panion  paper,  we  report  numerical  experiments  with  an  M/M/1  queue,  which  illustrate  the  basic 
convergence  properties  and  possible  pitfalls  of  the  various  techniques. 
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1.  Introduction 


Simulation  has  traditionally  been  used  to  evaluate  the  performance  of  complex  systems,  especially 
when  analytic  formulae  are  not  available.  Using  it  to  perform  optimization  is  much  more  challenging. 
Consider  a  (stochastic)  simulation  model  parameterized  by  a  vector  6  of  continuous  parameters,  and 
suppose  one  seeks  to  minimize  the  expected  value  o(tf)  of  some  objective  function.  In  principle, 
if  a{ff)  is  well  behaved,  one  could  estimate  its  derivative  (or  gradient)  by  simulation,  and  use 
adapted  versions  of  classical  nonlinear  programming  algorithms.  Recently,  the  question  of  how 
to  estimate  the  gradient  of  a  performance  measure  (defined  as  a  mathematical  expectation),  with 
respect  to  continuous  parameters,  by  simulation,  has  attracted  a  great  deal  of  attention.  See,  e.g., 
Glasserman  (1991),  Glynn  (1990),  L’Ecuyer  (1990),  Rubinstein  and  Shapiro  (1993),  and  Suri  (1989). 
For  “steady-state”  simulations,  a  “single-run”  iterative  optimization  scheme  based  on  stochastic 
approximation  (SA)  has  been  suggested  (Meketon  1987,  Pflug  1990,  and  Suri  and  Leung  1989).  At 
each  iteration,  this  scheme  uses  an  estimate  of  the  gradient  of  a  to  modify  the  current  parameter 
value.  These  methods  could  enlarge  substantially  the  class  of  stochastic  optimization  problems  that 
can  be  solved  in  practice. 

In  this  paper,  we  investigate  the  combination  of  SA  with  different  derivative  estimation  tech¬ 
niques  (DETs),  in  the  context  of  a  single  queue.  The  general  theory  of  SA  has  been  studied 
extensively  (see  Kushner  and  Clark  1978,  Metivier  and  Priouret  1984,  and  many  references  cited 
there),  but  not  so  much  their  combination  with  various  DETs,  for  discrete-event  systems  in  the 
steady-state  context,  as  we  do  here.  Preliminary  empirical  experiments  have  been  undertaken  by 
Suri  and  Leung  (1989)  for  a  M/M/1  queue.  These  authors  looked  at  two  SA  methods,  which  they 
presented  as  heuristics.  One  was  based  on  infinitesimal  perturbation  analysis  (IPA),  while  the  other 
was  an  adaptation  of  the  Kiefer- Wolfowitz  (KW)  algorithm,  which  uses  finite  differences  (FD)  to 
estimate  the  derivative.  They  did  not  prove  convergence.  We  examine  in  this  paper  many  DETs, 
including  some  based  on  FD,  with  and  without  common  random  numbers,  IPA,  and  variants  of  the 
likelihood  ratio  (LR)  or  score  function  (SF)  method.  These  techniques  can  be  combined  with  SA 
in  different  ways.  For  example,  at  iteration  n  of  SA,  one  can  use  either  a  (deterministic)  truncated 
horizon  t„,  or  a  fixed  number  i„  of  regenerative  cycles.  Also,  t„  can  increase  with  n  or  remain 
constant.  We  prove  a.s.  (almost  sure)  convergence  to  the  optimizer  for  many  SA/DET  variants. 
Within  each  class  of  DET  (FD,  LR,  IPA),  there  are  variants  for  which  we  require  — ►  oo  as 
n  — *•  oo,  and  others  for  which  there  is  no  constrmnt  on  t„  (e.g.,  it  can  be  constant).  For  the  latter, 
the  DETs  are  all  regenerative  estimators,  with  one  exception.  That  exception  is  IPA  with  the  same 
number  of  customers  at  each  SA  iteration,  for  which  we  prove  weak  convergence. 

Chong  and  Ramadge  (1992a,  1993)  also  analyzed  (in  parallel  to  us)  IPA-based  SA  algorithms 
to  optimize  a  single  queue,  and  proved  a.s.  convergence  to  the  optimum,  using  different  proof 
techniques  than  ours  and  different  assumptions.  In  their  first  paper,  they  studied  the  case  of  an 
M/G/1  queue,  while  in  their  second,  they  considered  a  GI/G/1  queue  and  an  SA  algorithm  which 
updates  the  parameter  after  an  arbitrary  number  of  customers.  That  includes  in  particular  the  case 
of  one  customer  per  SA  iteration.  In  Chong  and  Ramadge  (1992b),  they  extended  their  analysis  and 
convergence  proofs  to  more  general  regenerative  systems.  Fu  (1990)  previously  analyzed  a  different 
variant  of  SA-IPA  algorithm,  for  which  he  proved  a.s.  convergence.  His  algorithm  exploited  the 
regenerative  structure  of  the  system  and  the  special  form  of  the  objective  function  (1)  (see  §2.2).  His 
result  corresponds  to  our  Proposition  6  (c).  Wardi  (1988)  also  suggested  and  analyzed  a  different 
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variant  of  SA,  combined  with  IPA,  for  which  he  showed  a  non-standard  kind  of  convergence  which 
he  called  convergence  in  zero  upper  density.  In  ail  those  papers,  only  IPA  was  considered. 

A  different  approach  for  stochastic  optimization,  called  the  stochastic  counterpart  method,  is 
proposed  and  thoroughly  analyzed  in  Rubinstein  and  Shapiro  (1993).  The  basic  idea  is  to  estimate 
the  whole  objective  function  as  a  function  of  tf,  in  a  parameterized  form,  using  a  likelihood  ratio 
technique,  and  then  to  optimize  that  sample  function  by  a  standard  (deterministic)  optimization 
method.  In  this  paper,  we  do  not  consider  that  approach. 

In  §2,  we  consider  a  GI/G/1  queue  for  which  the  decision  variable  is  a  parameter  of  the  service 
time  distribution.  The  aim  is  to  minimize  a  function  of  the  average  system  time  per  customer.  We 
feel  that  many  important  questions  that  would  arise  in  more  general  models,  when  SA  is  used  to 
optimize  infinite- horizon  (steady-state)  simulations,  are  well  illustrated  by  this  simple  example.  We 
recall  the  classical  SA  algorithm  and  give  (in  Appendix  I)  sufficient  conditions  for  its  convergence 
to  the  optimum.  The  §3  reviews  different  ways  of  estimating  the  derivative  (DETs).  For  a  variety 
of  SA-DET  combinations,  we  prove  convergence  to  the  optimum,  under  specific  conditions  (see 
Propositions  3-7).  In  the  conclusion,  we  discuss  briefly  how  all  of  this  can  be  extended  to  more 
general  systems  and  mention  prospects  for  further  research.  A  companion  paper  (L’Ecuyer,  Giroux, 
and  Glynn  1993)  reports  numerical  investigations  and  discusses  the  question  of  convergence  rates, 
for  which  further  analysis  would  be  needed. 

All  our  proofs  are  relegated  to  Appendix  II.  Since  d  changes  constantly  between  the  iterations 
of  SA,  some  convergence  properties  of  the  derivative  estimators  (e.g.,  bounded  variance  and  con¬ 
vergence  in  expectation  to  the  steady-state  derivative)  must  be  shown  to  hold  uniformly  in  B.  As 
a  byproduct  of  our  proofs,  we  obtain  original  results  concerning  GI/G/1  queues  that  could  be  of 
independent  interest.  For  instance,  it  follows  from  the  renewal- reward  theorem  (Wolff  1989)  that 
for  a  stable  queue,  the  average  sojourn  time  of  the  first  t  customers  in  the  queue  converges  in 
expectation,  as  t  — >  oo,  to  the  infinite-horizon  average  sojourn  time  per  customer.  We  prove,  under 
appropriate  conditions,  that  this  convergence  is  uniform  over  $  and  s,  where  s  is  the  initial  state 
(taken  over  some  compact  set),  which  corresponds  to  the  waiting  time  of  the  first  customer,  and  9 
lies  in  a  compact  set  in  which  the  system  is  (uniformly)  stable.  We  also  derive  a  similar  uniform 
convergence  result  for  the  derivative  of  the  expected  average  sojourn  time  with  respect  to  0,  and  a 
few  additional  characterizations  of  this  expectation. 

2.  Example:  a  GI/G/1  Queue 


2.1.  The  Basic  Model 

Consider  a  GI/G/1  queue  (Asmussen  1987,  Wolff  1989)  with  interarrival  and  service-time  distri¬ 
butions  A  and  Bg  respectively,  both  with  finite  expectations  and  variances.  The  latter  depends  on 
a  parameter  S  e  0  =  [^mint  ^max]  C  IR.  We  assume  that  for  each  ^  €  0,  the  system  is  stable.  Let 
w(0)  be  the  average  sojourn  time  in  the  system  per  customer,  in  steady-state,  at  parameter  level 
0.  The  objective  function  is; 

q(<?)  =  u;(fl) C(«).  (1) 

where  C  :  0  >-»  K,  is  continuously  differentiable  and  analytically  available.  We  want  to  minimize 
a{0)  over  0  =  where  B^ia  <  ^min  <  ^max  <  ^max-  Let  0*  be  the  optimum.  We  define 
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0  and  0  this  way  to  be  able  to  do  FD  derivative  estimation  at  any  point  of  0  (see  §3.1).  This  is 
also  useful  for  LR  and  IPA.  Let  0'^  be  an  open  interval  that  contains  0. 

A  GI/G/1  queue  can  be  described  as  a  discrete-time  Markov  chain  as  follows.  For  t  >  1,  let 
VFi,  amd  W*  =  Wi  -I-  G  be  the  waiting  time,  service  time,  and  sojourn  time  for  the  t-th  customer, 
and  {/,  be  the  time  between  arrivals  of  the  t-th  and  (t  -1-  l)-th  customer.  For  our  purposes,  Wi  will 
be  the  state  of  the  Markov  chain  at  step  i.  The  state  space  is  5  =  [0,oo)  and  VFi  =  s  is  the  initial 
state.  VTi  =  0  corresponds  to  an  initially  empty  system.  For  t  >  1,  one  has 

W:  :=W,  +  (:,  and  VF.+,  :=  (IF*  - 1..)+  (2) 

where  i"*"  means  max(z,  0).  Since  C(B)  can  be  evaluated  directly,  we  will  estimate  only  the  derivative 
of  w(B)  and  then  add  C'(ff)  separately.  Here  and  throughout  the  paper,  the  “prime”  denotes  the 
derivative  with  respect  to  6. 

We  can  view  the  Markov  chain  {W,, t  =  1,2,...}  as  being  defined  over  the  probability  space 
(n,E,Pfl,3),  where  {Ps,ai  ®  €  0,s  €  5}  is  a  family  of  probability  measures  defined  over  (fl.E). 

The  sample  point  u  €  fl  represents  the  “randomness”  that  drives  the  system  and  Pa,,  depends 

(in  general)  on  6  and  s  (where  Wj  =  s  €  5  is  deterministic).  Let  Pa,,  denote  the  corresponding 
mathematical  expectation.  When  the  quantities  involved  do  not  depend  on  s,  we  sometimes  denote 
Ee,a  by  Es  to  simplify  the  notation.  For  t  >  1,  let 

ht{9,s,u)  =  (3) 

•=i 

ii;j(tf,s)  =  /  ht{B,s,u)dP9,,{uj).  (4) 

Ja 

Here,  ht(B,s,u)  represents  the  total  sojourn  time  in  the  system  for  the  first  t  customers,  and 
wt{6,s)  its  expectation.  Let  Pi  be  the  <T-field  generated  by  (Ci»»'i>- •  •.Cfi>'f)-  Then,  ht{B,s,ijj)  is 
Pi-meJisurable.  Also,  if  s  =  0  and  if  r  denotes  the  number  of  customers  in  the  first  busy  cycle,  then 
r  -I- 1  is  a  stopping  time  with  respect  to  {Pj,  t  >  1}.  Let  S  =  (0,c]  be  the  set  of  admissible  initial 
states,  where  c  is  a  (perhaps  large)  constant.  It  is  well  known  from  renewal  theory  that  for  each 
fixed  6  e  Q  and  s  €  5,  lim(_oo  v}t{0,s)/t  =  w{d). 


2.2.  Variants  of  the  Optimality  Equation 

If  a  is  convex  and  6*  lies  inside  0,  then  the  minimization  problem  is  equivalent  to  finding  a  root  of 

a’i0)  =  w\9)  +  a{e).  (5) 

Even  if  0*  is  on  the  boundary  of  0,  the  minimization  problem  can  be  solved  by  a  descent  method 
which,  at  each  step,  computes  a\9)  at  the  current  point  0  and  moves  opposite  to  its  sign.  Here, 
we  will  use  a  stochastic  descent  method  (see  §2.3),  which  at  each  iteration  moves  in  the  direction 
of  an  estimate  of  a'(9). 

Alternative  formulae  for  the  direction  of  descent  can  be  derived  using  a  regenerative  approach 
as  follows.  Let  s  =  0  and  let  t  be  the  number  of  customers  in  the  first  regenerative  cycle  (busy 
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period).  iFYom  elementary  renewal  theory  one  has  tv($)  =  u{B)/t{9)  where 

'  r 

u{9)  =  Eifi  ^w: 

m  =  E,,o[r]- 

If  w'{9)  exists,  then,  from  standard  calculus,  one  has 

^  ' - iW) - m — ■  ‘  ’ 

One  can  combine  estimators  for  each  of  the  four  quantities  on  the  right-hand-side  of  (6)  to  obtain 
an  estimator  for  w'(9).  Alternatively,  finding  a  root  of  (5)  is  the  same  as  finding  a  root  of 

u'(9)-w(9)/X9)  +  /(9)C'(9)  (7) 

or  of 

u'(9)£(9)  -  /'(9)u(9)  -h  /^(9)C'(9).  (8) 

So,  instead  of  using  an  estimate  of  (5)  in  the  descent  method,  one  use  an  estimate  of  (7)  or  (8). 
That  was  first  suggested  by  Fu  (1990)  for  (7)  and  Glynn  (1986)  for  (8).  The  interest  of  (7-8)  is  that 
unbiased  estimators  of  them  can  be  obtained  based  on  a  few  regenerative  cycles,  which  is  not  the 
case  for  (5).  For  example,  an  unbiased  estimator  of  (8)  is  easily  built  from  an  unbiased  estimator 
of  (/(9),i'(9),u'(9))  and  an  independent  unbiased  estimator  of  (/(0),u(0)).  Such  estimators  can  be 
constructed  via  the  LR  method,  based  on  two  regenerative  cycles  (see  §3.3).  Similarly,  an  unbiased 
estimator  of  (7)  can  be  constructed  via  IPA,  based  on  one  regenerative  cycle,  often  in  spite  of  the 
fact  that  the  estimators  of  u'(9)  and  £'(9)  are  individually  biased  (see  §3.4  and  Heidelberger  et  al. 
1988). 

Equations  (5)  and  (7-8)  are  specific  to  the  form  of  our  objective  function  (1).  For  a  more 
general  case,  let  a(6)  =  <p(9,w(9)),  where  <p  is  convex  and  continuously  differentiable.  Then,  as  in 
Chong  and  Ramadge  (1992b), 

where  dip  169  and  dtpjdw  denote  the  partial  derivatives  of  p  w.r.t.  its  first  and  second  component, 
respectively.  With  appropriate  conditions  on  p,  our  development  for  the  DETs  which  aim  at  esti¬ 
mating  a\9)  would  go  through  for  this  more  general  case.  Generalization  to  vectors  of  parameters 
is  straightforward.  Of  course,  more  complicated  non-convex  functions,  e.g.,  with  multiple  local 
minima,  are  more  difficult  to  deal  with,  as  in  the  deterministic  case. 

Equations  (7-8)  are  more  dependent  on  the  form  of  a  than  (5).  For  another  illustration,  let 
a(9)  =  D(9)w\9)  -f  C{9),  where  D  and  C  are  known  differentiable  functions.  Here, 

a'(9)  =  D'(9)w\9)  +  2D{9)w(9)w'(9)  +  C’i9). 

But  in  a  descent  algorithm,  one  can  use  instead  an  unbiased  estimator  of 

t^(9)a'{9)  =  D'(9)u'^(9)l{9)  +  2D(9)ui9)[t(9)u\9)  -  u{9)f(9)]  +  ^(«)C'(»), 

which  can  be  obtained  via  LR,  based  on  three  (independent)  regenerative  cycles. 
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2.3.  The  Stochastic  Approximation  Scheme 

We  consider  a  stochastic  approximation  (SA)  algorithm  of  the  form 

^n+l  •—  ^0(^1*  ~  'UnYn)'!  (9) 

for  n  >  1,  where  9n  is  the  parameter  value  at  the  beginning  of  iteration  n  (^i  €  6  is  fixed,  or 
random  with  known  distribution),  Vn  is  an  estimator  of  either  (5),  (7),  or  (8),  obtained  at  iteration 
{7fnr»  >  1}  is  a  (deterministic)  positive  sequence  decreasing  to  0  such  that  7"  = 

xe  denotes  the  projection  on  the  set  @  (i.e.  xe(^)  is  the  point  of  6  closest  to  9).  To  obtain  y„,  in 
each  case,  we  compute  directly  C'(9„),  and  estimate  only  the  remaining  terms,  by  simulating  the 
system  for  one  or  more  “subrun(s)”  of  finite  duration.  Specific  estimators  are  discussed  in  §3. 

Let  Sn  €  5  denote  the  state  of  the  system  at  the  beginning  of  iteration  n.  For  all  the  estimators 
that  we  consider,  the  distribution  of  (yn)^n+i)>  conditional  on  (tfn»«n)i  is  completely  specified  by 
n  and  and  is  independent  of  the  past  iterations.  In  other  words,  {(y„,tfn+i»Sn+i)»n  >  0} 

is  a  (nonhomogeneous)  Markov  chain  (Fb  is  a  dummy  value).  Denote  by  J5n-i(‘)  the  conditional 
expectation  E(-  |  9n<s„),  i.e.,  the  expectation  conditional  on  what  is  known  at  the  beginning  of 
iteration  n.  Suppose  that  each  Yn  is  viewed  as  an  estimator  of  (5)  and  is  integrable.  Then,  En-i(Yn) 
exists  and  we  can  write 

yn=a'(9n)  +  /3n  +  (n  (10) 

where 

/j„  =  £:„_i[y„]  -  a'(fl„)  (11) 

represents  the  (conditional)  bias  on  ¥„  given  (^n,Sn)>  while  e„  is  a  random  sequence,  with 
=  0  and  En-ii^l)  =  var  (^n  1  ^n,Sn).  If  yn  is  an  estimator  of  (7)  or  (8)  instead, 
then  replace  Q:'(fln)  in  (10-11)  by  /(dn)«'(^n)  or  /*(#n)o'(^n)»  respectively. 

2..^.  Convergence  to  the  Optimum 

Proposition  1  below,  proved  in  Appendix  I,  gives  (simplified)  sufficient  conditions  for  the  conver¬ 
gence  of  (9)  to  the  optimum.  It  treats  the  case  where  the  (conditional)  bias  /?„  goes  to  zero  and 
the  variance  of  Yn  does  not  increase  too  fast  with  n.  For  some  of  the  regenerative  methods,  one  has 
/?n  =  0  for  each  n.  Otherwise,  when  the  DET  uses  the  same  number  of  customers  at  all  iterations, 
/S„  typically  does  not  go  to  zero.  But  sometimes,  Eo{/3„)  -*  0  and  the  algorithm  might  still  converge 
to  the  optimum.  This  is  treated  by  Theorem  1  of  Appendix  I,  which  ensures  weak  convergence 
under  appropriate  conditions. 

PROPOSITION  1.  Suppose  that  a  is  differentiable  and  convex  or  strictly  unimodal  over  0.  If 
lim„_oo  /3ti  =  0  a.s.  and  >  linin-oo  ^  o.s..  ■ 

For  convenience  in  the  following  sections,  in  the  case  where  y„  is  a  truncated-horizon  estimator 
of  (5),  we  will  decompose  as  +  »  where  0n  component  due  to  the  fact  that 

we  simulate  over  a  finite  horizon  and  represents  the  possibility  that  Yn  may  itself  be  a  biased 
estimator  of  the  derivative  of  the  finite-horizon  expected  cost.  Typically,  with  FD,  ^  0.  If  we 
use  a  deterministic  horizon  t„  at  iteration  n,  then  (3^  —  ^{„(®n>^n)/*n  ~  To  make  sure  that 

the  latter  converges  to  zero  a.s.,  we  will  show,  under  appropriate  conditions,  that  w\{9,  s)/t  —  w'{9) 
converges  to  zero  uniformly  in  {9,  s)  as  t  goes  to  infinity. 
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2.5.  Continuous  Differentiability  and  Uniform  Convergence 


We  want  sufficient  conditions  under  which  a  is  convex  or  strictly  unimodal,  w  and  each  u;,(  -,s)  are 
differentiable,  and  the  following  uniform  convergence  results  hold: 

lim  sup  s)/t  -  tn(^)|  =  0  (12) 

S£&,tes 

and 

lim  sup  (u;J(tf,s)/t  —  =  0.  (13) 


In  Proposition  11,  we  establish  (12-13)  under  Assumption  A  below.  We  also  prove,  under 
Assumptions  A-C,  that  Wt(9,s)/t  is  convex  and  continuously  differentiable  in  B  for  each  s  and 
t,  and  that  a  is  also  convex  and  continuously  differentiable.  Note  that  these  properties  can  be 
expected  to  hold  only  when  appropriate  regularity  conditions  are  imposed  on  the  service  time 
distribution  8$.  On  the  other  hand,  the  properties  that  are  exploited  here  are  merely  sufficient 
for  the  validity  of  SA,  not  necessary.  Assumptions  A  and  B  are  used  for  IPA  and  LR  derivative 
estimation,  respectively  (they  are  typical  IPA  and  LR  assumptions),  while  C  is  used  to  ensure 
the  convexity  of  a.  For  example,  an  exponential  service  time  distribution  with  mean  0  verifies  all 
these  assumptions;  see  L’Ecuyer,  Giroux,  and  Glynn  (1993)  for  the  details.  Define  Ui  Bt{C). 
Then,  £/,  is  a  f/’(0, 1)  random  variable  and  Ci  =  min{C  |  5«(C)  <  Ui).  Define  also 

2.  =  ^  Vw)- 

ASSUMPTION  A. 

(i)  There  is  a  distribution  B  such  that  sup^^^o  Bg^{u)  <  B~^(u)  for  each  u  in  (0, 1).  The  queue 
remains  stable  when  the  service  times  are  generated  according  to  B.  Also,  ^[C*]  <  where 
1  <  K(  <  oo  and  E  is  the  expectation  that  corresponds  to  B. 

(ii)  For  each  u  €  (0, 1),  Bg^(u)  is  differentiable  in  0.  There  exists  a  measurable  F  :  (0, 1)  i-»  E, 

such  that  sup^ggo  1  <  /ifr  <  oo- 


ASSUMPTION  B. 

(i)  Assumption  A  (i)  holds  and  the  moment  generating  function  associated  with  B  is  finite  in 
some  neighborhood  of  zero. 

(ii)  Let  Be  have  a  density  be,  whose  support  {C  >  0  j  hs(C)  >  0}  is  independent  of  6  for  0  6  G°. 
(Hi)  Everywhere  in  0°,  be{Q  continuously  differentiable  with  respect  to  0,  for  each  C  >  0. 

(iv)  For  each  €  0  and  K  >  1,  there  exists  T  =  {0q  —  cq,  +  fo)  C  0®  and  ^  €  0®  such  that 

81 


and  Ee 


S€T  \hs{Q) 


<  oo. 


ASSUMPTION  C. 

C  is  convex  and  continuously  differentiable  and  for  each  u,  Bf^(u)  is  convex  in  0. 
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3.  Ways  of  Estimating  the  Derivative 


One  crucial  ingredient  for  the  SA  ailgorithm  considered  here  is  an  efficient  derivative  estimation  tech¬ 
nique  (DET).  In  this  section,  we  survey  some  possibilities  and  state  convergence  results  regarding 
their  combination  with  SA.  All  the  propositions  are  proved  in  Appendix  II. 


3.1.  Finite  Differences  (FD) 

This  method  is  described,  for  instance,  in  Glynn  (1989)  and  Kushner  and  Clark  (1978),  without 
the  projection  operator.  When  used  in  conjunction  with  FD,  the  SA  algorithm  (9)  is  known  as  the 
Kiefer- Wolfowitz  (KW)  algorithm.  Here,  we  describe  and  use  central  (or  symmetric)  FD.  For  other 
variants,  like  forward  FD,  see  the  above  references.  When  0  is  a  d-dimensional  vector,  the  latter 
uses  only  d-|- 1  instead  of  2d  subruns  per  iteration.  However,  its  asymptotic  convergence  rate  is  not 
as  good  (Glynn  1989).  Spall  (1992)  analyzes  a  different  FD  method  for  SA,  called  “simultaneous 
perturbation”,  and  shows  that  it  could  be  significantly  more  efficient  than  FD  when  d  is  large. 

Take  a  deterministic  positive  sequence  {c„.  n  >  1}  that  converges  to  0.  At  iteration  n,  simulate 
from  some  initial  state  s~  €  5  at  parameter  value  6~  =  5rg(tfn  -  Cn)  =  max(f>„  -  c„,  ^nun)  for 
customers.  Simulate  also  (independently)  from  state  s+  €  5  at  parameter  value  6*  =  xg(tf„-|-c„)  = 
min(d„  -1-  c„,  dniMt)  for  tn  customers.  Let  ui~  and  lj*  denote  the  respective  sample  points.  A  FD 
estimator  of  (5)  is 

y,  =  (14) 

The  conditional  bias 

[Yn  -  C'„{0)  -  wl{0n,sM 
can  itself  be  decomposed  as  /?^  =  -I-  ,  where 

gD  _  ^t„i0t^Sn)  -  WtJ0~,S„)  _  w'tJ0n,S„) 

(0t-0n)tn  tn 

and 

^tA^,st)-Wt„{0t,3n)  +  Wt„{0-,Sn)-Wt„{0-,S-) 

- {Si-KYn  ‘  ’ 

represent  respectively  the  bias  due  to  finite  differences  and  the  bias  due  to  the  possibly  different 
initial  states. 


PROPOSITION  2.  Let  Assumptions  A-B  hold,  t„  -*  oo  and  c„  — ►  0.  Then  limr,-,To  0n  ^  0- 


The  term  can  be  eliminated  by  picking  s"  ~  s^  =  3„.  Otherwise,  if  ‘s  bounded, 

the  numerator  in  (15)  should  be  in  0(l/t„)  (asymptotically).  In  that  case,  to  get  -*  0,  take 
l/(fnCn)  — >  0-  Even  when  =  0,  taking  tn  constant  may  lead  to  problems,  because  is  usually 
not  zero.  As  c„  -♦  0,  when  u~  and  u*  are  distinct  (“independent”),  the  variance  of  Tn  usually 
increases  to  infinity.  However,  we  have; 
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PROPOSITION  3.  Suppose  one  uses  SA  with  the  estimator  (H).  Let  Assumptions  A-C  hold, 
in  -*  oo,  Cn  0,  and  <  *3.  Assume  that  — ►  0  a.s.  as  n  -*  oo  (this  can  be 

achieved  triviaU'j  by  taking  a"  =  aj  =  Sn).  Then,  — ►  8*  a.s..  | 


Note  that  in  the  proofs  of  Propositions  2  and  3,  Assumption  A  is  used  to  prove  (13),  while  C 
is  used  to  prove  the  convexity  of  u;(  ),  and  B  is  only  used  to  prove  the  continuous  differentiability. 
These  remarks  also  apply  to  Proposition  6. 

A  different  approach  is  to  estimate  (8)  instead  of  (5)  using  finite  differences.  A  forward  FD 
approximation  of  (8)  at  =  i9„,  adapted  from  Glynn  (1986),  is 


“(^" )  “^^"  ■<(<?»)  -  i--«(gn) + f(f>:)/(ff„)c'(f?„) 


8;t-8n 


—  "n 


(16) 

(17) 


To  estimate  (17),  simulate  for  2tn  independent  regenerative  cycles,  using  parameter  value  0„ 
for  the  odd  [even]  numbered  cycles.  Let  denote  the  number  of  customers  during  the  j-th  cycle 
and  hj  denote  the  total  system  time  for  those  Tj  customers.  Then,  an  unbiased  estimator  of  (17)  is 


/  f*2}‘''2j-l  -  b.2j-lT2j 
tn  V  8n  -8n 


+ 


r2jr2j-i 


Here,  -*  oo  is  not  required.  For  instance,  one  can  use  fn  =  1  for  all  n. 


(18) 


PROPOSITION  4.  Let  Assumptions  A-C  hold,  c„  =  n“’  forO  <  q  <  1/2,  and  suppose  one  uses 
SA  with  the  estimator  (18).  Then,  6„  — ►  6*  a.s..  | 


S.2.  Finite  Differences  with  Common  Random  Numbers  (FDC) 

One  way  to  reduce  the  variance  in  (14)  is  to  use  common  random  numbers  across  the  subruns  at 
each  iteration,  start  all  the  subruns  from  the  same  state:  s~  =  s;!*  =  s„,  and  synchronize.  More 
specifically,  one  views  u  as  representing  a  sequence  of  C7(0, 1)  variates,  so  that  all  the  dependency 
on  (®,  s)  appears  in  ht{9,  s,  •).  Take  =  w"  =  u;„.  Since  the  subruns  are  aimed  at  comparing  very 
similar  systems,  /h,(tf^,s„,a;„)  and  ht^{8~ , Sn,u„)  should  be  highly  correlated,  especially  when  Cn 
is  small,  so  that  considerable  variance  reductions  should  be  obtained.  Conditions  that  guarantee 
variance  reductions  are  ipven  in  Glasserman  and  Yao  (1992).  Proposition  2  still  applies.  However, 
taking  t„  -*•  oo  is  essential.  In  L’Ecuyer,  Giroux,  and  Glynn  (1993),  we  discuss  implementation 
issues  related  to  FDC  and  show  that  if  t„  is  kept  constant,  SA  with  FDC  typically  converges  to 
the  wrong  value. 
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S.3.  A  Likelihood  Ratio  (LR)  Approach 


The  LR  approach  (Glynn  1990,  L’Ecuyer  1990,  Reiman  and  Weiss  1989,  and  Rubinstein  1989)  can 
be  used  as  follows  to  estimate  w't{0,s).  Let  us  view  u;  as  representing  the  sequence  of  inter-arrival 
and  service  times  for  the  first  t  customers,  that  is,  u  =  (Cii  i  C«i  ^’t)-  Eor  any  s  €  5,  to 
differentiate  the  expectation  (4)  with  respect  to  0,  take  a  fixed  €  ©  and  rewrite: 


ht(0,s,u)Lt{0o,0,3,uj)dP9o^,(u)) 


(19) 


where 


Lti0Q,0,S,Lj)='[l 

t=l 


b0{C.) 

bSoiCt) 


is  a  likelihood  ratio.  Under  appropriate  regularity  conditions  (see  L’Ecuyer  1993),  one  can  differ¬ 
entiate  Wt  by  differentiating  inside  the  integral: 


w't{0,s)=  f  ij}t{0,s,u)dP0^,(Lj) 
JO 


where 

rl;t(0,s,uj)  =  ht(0,s,u)Lt(0o,0,s,u;)  =  hi(0,s,u>)Lt(0o,0,3,Lj)St(0,3,i^) 


(20) 


is  the  LR  estimator, 


5t(tf,s,w) 


Lt(0o,0,s,u>)  ^ 


(21) 


is  called  the  scorz  function,  and 

d,  =  ^ln6s(C). 

Only  one  simulation  experiment  (using  Ps^,,)  is  required  to  estimate  the  derivative.  ),FVom  Propo¬ 
sition  14,  under  Assumption  B,  (20)  is  an  unbiased  estimator  of  w[{0,  s)  for  0  in  some  neighborhood 
of  0Q.  After  adding  the  derivative  of  the  deterministic  part  and  taking  0o  =  0n,  the  LR  derivative 
estimator  at  iteration  n  of  SA  becomes 


in  —  "b  '0tn(^ni'Sni‘*^n)/^n  —  C”  (^n)  +  ht„(^n»  5n»**’n)‘S't„(^n>  ^n»‘*'n)/^n 


(22) 


Note  that  the  variance  of  St(0,s,u>)  (and  of  (20)  at  =  ^o)  increases  with  t.  This  is  a  significant 
drawback  and  must  be  taken  into  account  when  making  the  tradeoff  between  bias  and  variance. 
Here,  0^  =  0  and  0^  -»  0  as  t„  —  0.  But  the  variance  on  ¥„  then  goes  to  infinity  also.  One 
remedy,  as  in  FD,  is  to  increase  t„  more  slowly.  We  show  in  Proposition  17  that  under  Assumption 
B,  the  variance  of  Y„  does  not  increase  faster  than  linearly  in  t„.  The  conditions  of  Proposition  1 
can  then  be  verified  with  7„  =  7on”*  and  t„  =  to  Un'’  for  0  <  p  <  1,  where  to  and  ts  >  0  are 
two  constants.  In  the  finite-horizon  case,  SA  with  LR  converges  at  a  rate  of  t“*/*  (Glynn  1989)  in 
terms  of  the  total  simulation  length  t.  But  when  the  variance  increases  with  t„  and  t„  increases 
with  n,  this  is  no  longer  true. 

One  can  circumvent  the  bias/ variance  problem  of  LR  by  exploiting  the  regenerative  structure 
(Glynn  1986  and  1990,  Reiman  and  Weiss  1989).  One  approach  is  to  combine  estimators  for  each 
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of  the  four  quantities  on  the  right-hand-side  of  (6)  to  construct  an  estimator  for  w'(0).  One  can 
estimate  l(ff)  and  u(0)  as  usual,  and  u'(ff)  and  i'(ff)  by: 

V’u(0,w)  =  SAffM  (23) 

and 

=  rST(ff,0,u),  (24) 

respectively.  ^From  Proposition  15,  under  Assumption  B,  these  estimators  are  unbiased.  Suppose 
one  simulates  for  r  regenerative  cycles.  Let  ry  be  the  number  of  departures  during  the  j-th  cycle, 
hj  the  total  system  time  for  those  rj  customers  in  cycle  j,  and  Sj  the  score  function  associated 
with  that  cycle.  Then,  an  estimator  of  w'(ff)  is  given  by 


E  E  ~  E  E 


j=i  j=i  >=i  >=i 


(25) 


This  estimator  is  biased  for  finite  r.  However,  we  show  in  Proposition  18  that  under  Assumption 
B,  as  r  — ►  00,  (25)  has  bounded  variance  and  converges  in  expectation  to  w'(ff)^  uniformly  with 
respect  to  0.  The  corresponding  estimator  of  a'(tf„)  for  iteration  n,  based  on  regenerative  cycles, 
is: 


Y„  =  C'i0„)  + 


(26) 


Now,  instead  of  trying  to  estimate  a'(0„)  at  each  iteration,  one  can  estimate  (8).  Since  (23-24) 
provide  unbiased  estimators  of  u'(0)  and  l^{0),  an  unbiased  estimator  of  (8)  can  be  obtained  from 
two  independent  regenerative  cycles  as  described  in  the  text  that  follows  (8).  One  can  also  use 
more  than  two  cycles  and  average  out.  Further,  estimators  of  all  quantities  can  be  computed  from 
each  cycle  and  combined  in  a  splitting  scheme.  Take  2tn  cycles  at  iteration  n,  and  let  r^,  hj,  and 
Sj  be  defined  as  for  (25).  Then,  an  unbiased  estimator  of  (8)  aX  0  =  On  is 


in  /J 
3=1 


-  fi2jS2j-l'i'2j-l)  +  ^2j‘’‘2j-l 


(27) 


Since  this  estimator  is  unbiased,  t„  can  be  taken  constant  in  n,  (e.g.  !»  =  1  foe  ^ll  n). 

The  estimators  (22),  (26)  and  (27)  can  be  integrated  into  the  SA  algorithm.  The  following 
proposition  tells  us  about  the  a.s.  convergence  of  such  a  scheme. 


PROPOSITION  5.  Let  Assumptions  A-C  hold. 

(a)  Suppose  one  uses  SA  toith  the  LR  estimator  (22).  If  Sn  €  S  for  all  n,  — ►  oo,  and 

fn"'*  <  oo,  tf^en  0n  -►  0"  a.s.. 

(b)  If  one  uses  SA  with  the  regenerative  LR  estimator  (26)  and  t„  —►  oo,  then  0n  — ►  a.s.. 

(c)  If  one  uses  SA  with  the  estimator  (27),  then  0^  -*■  0*  a.s..  ■ 
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3.4.  Infinitesimal  Perturbation  Analysis  (IPA) 


The  basic  idea  of  IPA,  applied  to  our  context,  is  to  estimate  ,  s)  by  the  sample  derivative 

t 


h[id,s,u)  =  '£6i. 


i=l 


where  u  is  interpreted  as  the  sequence  Ui.Ui, . . .,  defined  before  Assumption  A, 


_  (Uj)  _  Y 


J=V, 


ae 


(28) 


(29) 


and  V,  is  the  first  customer  with  index  >  1  in  the  busy  period  to  which  customer  i  belongs.  That 
is,  Vj  =  max{j  1  1  <  J  <  »  and  Wj  —  0}  if  that  set  is  non-empty,  v,  =  1  otherwise.  For  further 
details  and  justifications,  see  Glasserman  (1991)  or  Suri  (1989).  Then, 

Yn  =  C'idn)  +  /i;(«„,S„,U-)/tn,  (30) 


which  can  be  computed  easily  during  the  simulation.  The  sum  (29)  is  called  the  IPA  accumulator. 
Observe  that  imposing  v,  >  1  means  that  we  consider  only  the  service  time  perturbations  of  the 
customers  who  left  during  the  current  SA  iteration.  In  other  words,  (28)  assumes  that  the  IPA 
accumulator  is  reset  to  zero  between  iterations.  The  initial  state  Sn  of  iteration  n  can  be  either  0 
for  all  n  (always  restart  from  an  empty  system),  or  be  the  value  of  (VF*  -  Vt)'*'  from  the  previous 
iteration  (for  n  >  1). 

We  can  consider  another  variant  of  IPA  in  which  the  IPA  accumulator  is  not  reset  to  zero 
between  iterations.  In  that  case,  both  s„  and  the  initial  value  a„  of  the  IPA  accumulator  are  taken 
from  the  previous  iteration.  The  value  of  a„  is  the  value  of  it  from  the  previous  iteration  if  5„  >  0, 
and  is  0  otherwise.  It  must  be  considered  as  part  of  the  “state”.  For  this  IPA  variant,  (28)  must 
be  modified  to:  ^ 

h[i0,s,a,u;)  =  ak;  +  j2^i,  (31) 

«=i 

where  a  is  the  initial  value  of  the  IPA  accumulator  and 


fc*  =  min(t,  min{t  >  0  |  =  0}). 


(32) 


represents  the  number  of  customers  in  the  current  iteration  who  are  in  the  same  busy  period  as  the 
last  customer  of  the  previous  iteration,  when  IFi  =  s  >  0. 

As  for  LR,  we  can  also  construct  regenerative  IPA  estimators.  With  s  =  0,  the  value  of  (28)  for 
the  first  cycle  is 


h;(0,O,u;)=^^Z,. 

•=1  j=l 


(33) 


With  r  cycles,  let  tj  and  h'j  denote  the  respective  values  of  r  and  /»l^(fl,0,w)  for  the  j-th  cycle.  An 
estimator  of  w\9)  is  then 

r,=ir/ 


(34) 


11 


At  iteration  n,  take  r  =  t„  regenerative  cycles  and  let  T„  =  5Zj=i  '’’i-  This  yields 


Yn  =  C'(«n)  + 


(35) 


Unfortunately,  for  finite  tn,  this  estimator  is  biased  for  a'{6„).  To  better  exploit  the  regenerative 
structure,  one  can  estimate  (7)  instead  of  (5),  using  IPA.  This  was  suggested  in  Fu  (1990).  iFrom 
Proposition  11  and  (6),  £#[/iT(fl,0,a>)]  =  l{B)v}'{6)  =  u'(tf)  —  w{S)t{0),  so  that  IPA  provides  an 
unbiased  estimator  of  (7)  based  on  a  single  regenerative  cycle.  Using  cycles  at  iteration  n  and 
averaging  out  yields  the  following  estimator: 

Yn  =  ri^if^i  +  r,C'(0n)).  (36) 

j=i 

As  pointed  out  by  Fu  (1990),  proving  a.s.  convergence  of  SA  to  the  optimizer  is  relatively  easy  with 
(36),  because  it  is  unbiased  for  (7)  a.t  0  =  0„  for  any  in-  With  (30)  and  (35),  it  is  more  difficult. 

Heidelberger  et  al.  (1988)  argue  that  (28)  divided  by  f  is  a  consistent  estimator  of  w'(0)  for  a 
rather  general  class  of  GI/G/1  queues,  and  give  a  proof  for  the  M/G/1  case.  To  prove  convergence 
of  SA  using  Proposition  1,  what  we  need  is  not  convergence  of  (28)  divided  by  t  to  w'(0)  a.s.  (as 
t  — ►  oo),  but  convergence  in  expectation,  uniformly  over  0.  In  fact,  both  kinds  of  convergence,  as 
well  as  variance  boundedness,  follow  from  Propositions  9-11.  Proposition  9  also  shows  that  the 
IPA  estimator  (28)  is  unbiased  for  w'f(0,  s)  under  Assumption  A.  This  leads  to: 


PROPOSITION  6.  Let  Assumptions  A  and  C  hold. 

(a)  Suppose  one  uses  SA  with  the  IPA  estimator  (30).  If  Sn  &  S  and  a„  =  0  for  all  n,  and 
tn  — ►  00,  then  0n  -*  0*  a.s.. 

(b)  If  one  uses  SA  with  the  estimator  (35),  with  — ►  oo,  then  0^  — >  0'  a.s.. 

(c)  If  one  uses  SA  with  the  estimator  (36),  then  0n  0“  a.s..  | 

If  On  is  not  reset  to  0  between  iterations,  proving  Proposition  6  (a)  appears  more  difficult,  but  we 
believe  that  the  result  still  holds.  Proposition  6  (c)  corresponds  to  the  result  of  Fu  (1990). 

For  this  GI/G/1  example,  IPA  has  the  stronger  property  that  even  when  using  a  truncated 
horizon  tn  that  is  constant  with  n,  if  the  IPA  accumulator  o„  is  not  reset  between  iterations  and 
under  mild  additional  assumptions,  SA  converges  to  the  optimizer.  On  the  other  hand,  if  Un  is 
reset  to  zero  at  the  beginning  of  each  iteration,  then  we  have  the  same  problem  as  with  FDC. 
By  keeping  the  value  of  a^  across  iterations,  the  estimator  takes  into  account  the  perturbations 
on  the  service  times  of  the  customers  who  left  during  previous  iterations.  It  is  true  that  the 
structure  of  the  busy  periods,  and  (in  general)  the  individual  terms  of  the  sum  (28),  could  depend 
on  0,  which  changes  between  iterations.  But  as  0„  converges  to  some  value,  that  change  becomes 
negligible  under  appropriate  continuity  assumptions.  (In  the  present  GI/G/1  context,  the  Zj's 
are  in  fact  independent  of  0,  but  not  the  v,’s.)  With  this  intuitive  reasoning,  we  should  expect 
SA-IPA  to  converges  to  0*  for  whatever  tn-  Proposition  7  states  that  this  is  effectively  true  under 
a  few  (sufficient)  conditions.  Here,  we  cannot  use  Proposition  1  because  /?„  0.  Instead,  we 
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give  a  weak  convergence  proof  by  verifying  the  assumptions  of  the  main  theorem  of  Kushner  and 
Shwartz  (1984).  Using  different  proof  techniques,  Chong  and  Ramadge  (1992a,  1993)  have  shown 
a.s.  convergence  (which  is  stronger)  for  SA-IPA  with  constant  truncated  horizon.  On  the  other 
hand,  with  the  regenerative  IPA  estimator  (35),  SA  does  not  converge  to  8*  in  general  if  A 


PROPOSITION  7.  Consider  the  SA  algorithm  with  IPA,  under  Assumptions  A-C,  with  {7„,  n  > 
0}  satisfying  W4  of  Appendix  I,  and  constant  truncated  horizon  tn  =  t.  Let  the  interarrival  time 
distribution  have  a  bounded  density.  Suppose  that  Zi  can  be  expressed  ns  Zi  =  where 

<p  :  Q  X  — »  R  is  a  function  such  that  <»?(•,  C)  *■*  continuous  for  each  C  >  0  (and  is  not  expressed 
as  a  function  of  Ui).  Suppose  also  that  the  IPA  accumulator  is  not  reset  to  0  between  iterations. 
Then,  converges  in  probability  to  the  optimum  6“ .  | 

4.  Conclusion 

Through  a  simple  example,  we  have  seen  how  a  derivative  estimation  technique,  such  as  FD, 
IPA,  or  LR,  can  be  incorporated  into  a  SA  algorithm  to  get  a  provably  convergent  stochastic 
optimization  method.  In  the  companion  paper  (L’Ecuyer,  Giroux,  and  Glynn  1993),  we  report 
numerical  investigations  and  point  out  some  dangers  associated  with  different  kinds  of  bias.  The 
performance  of  these  algorithms  when  there  are  many  parameters  to  optimize,  the  incorporation 
of  proper  variance  reduction  techniques,  the  study  of  convergence  rates,  and  comparisons  between 
SA  and  the  stochastic  counterpart  approach  (Rubinstein  and  Shapiro  1993),  are  other  interesting 
subjects  for  further  investigation.  In  principle,  IPA  and  LR  can  be  used  to  estimate  higher  order 
derivatives,  but  the  variance  is  likely  to  be  high.  Is  it  too  high  to  permit  the  implementation  of 
good  second  order  algorithms  based  on  these  estimates  ?  Again,  further  investigation  is  needed. 

The  convergence  results  of  §3  can  be  extended  to  more  general  models  than  the  GI/G/1  queue. 
Consider  for  example  a  general  discrete-time  Markov  chain  model  parameterized  by  6.  Let  Wt{0,  s)/t 
be  the  expected  average  cost  per  step  for  the  first  t  steps,  if  the  initial  state  is  s  (we  have  removed 
C{9)).  Suppose  that  (12-13)  hold  (which  implies  that  the  derivative  exists),  that  w{0)  is  strictly 
unimodal,  and  that  an  unbiased  LR  or  IPA  derivative  estimator  for  Wt{0,s)  is  available.  If  the 
variance  of  the  LR  estimator  is  in  0{t),  then  Proposition  5  (a)  applies,  while  if  the  variance  of  the 
IPA  estimator  is  in  0(l/t),  then  Proposition  6  (a)  applies.  Further,  if  the  system  is  regenerative, 
and  if  unbiased  LR  estimators  are  available  for  l’{0)  and  u'{0),  then  one  can  construct  estimators 
for  ir'(d„)  and  ^^(fl„)ti;'(®„)  as  in  (26)  and  (27).  If  those  estimators  have  bounded  variances  and 
converge  in  expectation  uniformly  in  0,  as  tn  -*  oo,  then  Proposition  5  (b-c)  applies.  If  a  FD  or 
FDC  estimator  is  used  and  if  u;(-)  and  Wt{-,s)  are  continuously  differentiable  (for  each  s),  then 
Proposition  3  applies.  All  of  this  generalizes  straightforwardly  to  the  case  where  ^  is  a  vector  of 
parameters.  Derivatives  are  then  replaced  by  gradients. 

Appendix  I.  Sufficient  Convergence  Conditions 

In  this  appendix,  we  prove  Proposition  1  and  give  a  second  set  of  sufficient  conditions,  which  imply 
weak  convergence  of  the  SA  algorithm  (9)  to  9*. 
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PROOF  of  Proposition  1.  For  each  n,  the  sequence  {23i=n  j  >  1}  is  a  martingale.  For 
each  €  >  0,  from  Doob’s  inequality  and  from  S3,  we  have 


P 


]£7.'^o[fn  <  ^£(7./^.)^- 


for  some  constant  K.  This  upper  bound  goes  to  zero  as  n  — ►  oo.  Hence,  we  obtain  condition 
i42.2.4"  of  Kushner  and  Clark  (1978),  and  the  result  then  follows  from  their  Theorem  5.3.1.  ■ 

Often,  0  ■/*  0,  but  Eo{0n\  — ►  0  as  n  — ►  oo,  and  the  algorithm  converges  as  well  to  the  optimum. 
This  is  addressed  by  the  following  (weaker)  result,  which  follows  from  the  results  of  Kushner  and 
Shwartz  (1984)  and  by  adapting  the  proof  of  the  second  part  of  Theorem  4.2.1  in  Kushner  and 
Clark  (1978)  (note  that  in  the  last  paragraph  of  the  latter  proof,  the  max  should  be  replaced  by 
a  min).  We  now  restate  the  assumptions  of  Kushner  and  Shwartz  (1984),  with  slight  adaptations. 
See  the  latter  reference  for  further  details. 


Wl.  Denote  =  (y„,  s„+i)  €  R  x  5.  Assume  that  Pj,,  is  weakly  continuous  in  {6,  s),  m  the  sense 
that  =»  Pjo.jo  when  (0,s)  — ►  (tfo»«o)»  that  F[F„+c  I  (^n>«n)  =  (^»"»)]  is  continuous 
in  (^,  s),  for  some  integer  c  >  0.  Assume  that  for  each  fixed  B  €  Q,  i.e.  if  7„  =  0  for  all 
n,  ^  1}  is  a  Markov  process  with  unique  invariant  measure  P*  and  corresponding 

mathematical  expectation  E^.  Denote  v{0)  =  E^{Yn)-  Let  {P^,B  G  0}  and  {^„,n  >  1}  be 
tight  (the  latter  uniformly  over  B  and  s;  see  Kushner  and  Shwartz  1984). 

W2.  For  each  compact  C  C  R  x  5,  there  is  an  integer  nc  <  oo  such  that  for  each  T  >  0,  the  set 

of  probability  measures  {P[(^„+3,4„+j_i)  €  •  i  €  0,  ^  €  C,  n  >  1,  j  > 

"C)  7«  ^T,  C  compact  subset  of  5}  is  tight. 

W3.  For  some  constant  «  >  0,  sup„>i  <  oo. 

W4.  7„  >  0  for  all  n,  lim„-.oo  7n  =  0,  J2n=\  7n  =  oo  and  |7„+i  -  7„|  <  oo. 

W5.  The  function  v  is  nondecreasing  in  0  and  has  a  unique  root  at  B’  €  0- 

THEOREM  1.  Under  Wl — W5,  — ►  B'  in  probability,  i.e.  for  each  c  >  0,  lim„_oo  P(||^n  - 

®*ll  ^  f)  =  0-  Also,  v{B)  is  continuous  in  B.  ■ 

Appendix  II.  Convergence  Proofs 

In  this  Appendix,  we  prove  that  under  our  assumptions,  LR  and  IPA  provide  unbiased  estimators 
for  w[{B,  s).  We  obtain  variance  bounds  for  these  derivative  estimators  and  for  their  regenerative 
counterparts,  which  are  asymptotically  unbiased  and  converge  in  quadratic  mean,  uniformly  in  B. 
We  also  show  that  wt{-,s)  and  w{-)  are  continuously  differentiable  and  that  (12-13)  hold.  We  then 
prove  Propositions  3  to  7. 
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PROPOSITION  8.  Let  s  ^  S  be  the  initial  state  and  t  be  the  number  of  customers  served  before 
the  system  empties  out  for  the  first  time.  Let  ht(6,s,u;)  be  defined  as  in  (28).  Under  A  (i),  there 
exists  finite  constants  K(,  Kr,  and  Kh,  all  >  1,  such  that 


sup  Eg  [c®] 
«€©o  •' 

VI 

(37) 

sup  Eg,,  fr®] 
*€S,  tfee® 

<  Kr; 

(38) 

sup  [(/>r(tf,3,‘*>))^] 

j€S,  fled®  ^ 

<  Kh. 

(39) 

Under  A  (i-ii),  there  also  exists  a  finite  constant  >  1  such  that 

sup  Eg,,  [(/i',(^.s.‘*'))^ 

VI 

(40) 

MeS,  «ed<> 


PROOF.  iFrom  A  (i),  Eg[C^]  <  .E[C®]  A'c  <  <».  which  gives  (37).  iFrom  Theorem  III.3.1 
(i)  in  Gut  (1988),  page  78,  one  has  £(r*  |  s  =  c]  A’t  <  oo.  Now,  let  C»  =  and  f  be 

the  respective  values  of  Ci  and  r  obtained  if  B$  is  replaced  by  B  while  w  =  {U\,U2,  ■  •  •)  remains 
the  same.  One  has  ^  0-  But  increasing  any  service  time  or  increasing  s  cannot  decrease  the 
number  of  customers  in  the  first  busy  cycle.  Therefore,  t  is  stochastically  dominated  by  f ,  which 
is  itself  stochastically  non^decreasing  in  s.  ^From  basic  stochastic  ordering  principles  (Wolff  1989), 
this  implies  (38). 

For  each  t,  one  has  Wi  <  s  +  Ci>  so  fo*'  f  ^  =  52!=i(^»  +  C)  S 

t  (s  +  Ci)-  Recall  that  if  ib  is  any  (integer)  power  of  two,  then  (i  +  y)*'  <  2*"^!*  +  y*)  (easy 
to  check  by  induction  on  k).  Therefore, 


(h,(<l,s,u;))'' 


(41) 


This  holds  in  particular  for  f  =  r  and  k  =  4.  Then,  from  Theorem  1.5.2  in  Gut  (1988),  page  22 
(used  in  the  third  inequality),  there  is  a  constant  Ki  <  oo,  independent  of  0  and  s,  such  that 


<  8s*Eg^[T*]  +  8£;j,,[r»]  +  8E9,, 


i‘)' 


<  8s*Ee,,[r*]  +  8£;^,.[r*]  +  8Kx{Es,,[T^]Ee,,[(^]fl^ 


<  8{c*Kr4-Kr4-KiKrK(^)  Kk  <  00. 


iFrom  (28),  for  any  k  >  1,  one  has 


{h\(e,sMf  < 


(42) 
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Again,  since  this  holds  for  t  =  r  and  *:  =  2,  and  using  Theorem  1.5.2  of  Gut  (1988),  there  is  a 
constant  K2  <  oo  such  that 

<  Ee,, 

<  EsAA  +  K2iE0,,[r*mZt])^^^ 

<  Kr  +  KjKrKr  **=  a;  <  00.1 

PROPOSITION  9.  Under  Assumption  A,  for  each  s,  Wt(-,s)  is  differentiable,  and  for  each 
{0,s)  6  0x5,  (28)  is  an  unbiased  estimator  of  w[{6,s). 


PROOF.  For  fixed  w,  ht{6,s,uj)  is  continuous  in  each  and  therefore  continuous  in  0  from  A 
(ii).  It  is  also  differentiable  in  0  everywhere  except  when  two  events  (arrival  or  departure)  occur 
simultaneously,  which  happens  at  most  for  a  denumerable  number  of  values  of  0  for  almost  any 
(fixed)  w.  Also,  for  any  fixed  0,  this  happens  with  probability  zero.  Using  (42)  and  (37),  one 
obtains,  for  all  0  €  0°, 


Ee,,  [(fi;(fl,s,a;))2 


<  t*E6[Zl]  <  t*KT. 


Since  every  €  0  has  a  neighborhood  contained  in  0°,  the  conclusion  then  follows  from  Theorem 
1  in  L’Ecuyer  (1990).  ■ 


We  now  show  that  the  mean-square  error  of  ht(0,s,u)/t  as  an  estimator  of  w{0),  as  well  as  the 
meam-square  error  of  h[{0,s,u)/t  as  an  estimator  of  w'{0),  are  in  the  order  of  1/1,  uniformly  in 
(0,  s)  €  0  X  S.  As  a  consequence,  the  variance  and  squared  bias  of  these  estimators  also  decrease 
uniformly,  linearly  in  1.  The  uniform  convergence  properties  (12-13)  will  follow  from  that. 


PROPOSITION  10.  Under  Assumption  A  (i),  there  is  a  finite  constant  K,  such  that  for  all 


t>l, 


sup 

*eS,  tfe© 


ht{0,s,u) 

t 


(43) 


Under  Assumption  A,  w(0)  is  differentiable  everywhere  in  0  and  there  is  a  finite  constant  K'^  such 


that  for  all  t  >  1, 


sup 

*€5,  «€© 


(44) 


PROOF.  We  adopt  the  convention  that  the  j-tb  busy  cycle  ends  when  the  system  empties  out 
for  the  j-th  time.  When  s  0,  the  first  “busy  cycle”  does  not  obey  the  same  probability  law 
than  the  others,  but  all  the  busy  cycles  are  nevertheless  independent.  For  j  >  1,  let  Tj  be  the 
number  of  customers  in  the  7-th  busy  cycle,  hj  the  total  sojourn  time  of  those  Tj  customers,  and 
Aj  =  hj  -  w{0)Tj.  For  j  >  2,  one  has  Ee[Aj]  =  0.  iFfo™  Proposition  8,  w\0)  <  Efllhj]  <  Kh 
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and  EaM]]  <  +  K^rJ]  <  A'fc(l  +  Kr)  for  aU  >  >  1.  Let  M(t)  =  6up{i  >  0  |  Y:)=i  <  0 

and  A(t)  =  rj.  Since  the  Aj's  are  independent  and  have  zero  expectation  (for  the  fourth 

inequality),  applying  Wald’s  equation  (for  the  fifth  inequality),  and  observing  that  M{t)  <  t  (for 
the  sixth  inequality),  one  obtains: 


■pE,.. 


y>- 


«w«)+l  \  A(0  \ 

/M(t)+1  2 


E  E  + 

i,j  =  l  j=l 

M(0+1 

£  +  u>^(«)rj') 

.  1=1 


< 

< 


- 


[Ee,s  [a]  +  h]  +  u;2(«)r?]  +  EB,o[Mit)]Ee,o[Al  +  /i^  +  u,2(<?)r2]] 

(i  +  t)(2A:fc(i  +  ii:o)  < 


which  proves  (43),  with  Ke  =  8A’/i(l  +  Kr). 

To  prove  (44),  let  h'  denote  the  sum  of  the  i,’s  associated  with  the  Tj  customers  of  the  j-th 
busy  cycle,  w'{0)  =  i^s[/i2]/f^«[r2],  and  A'j  =  h'^  —  w'(0)rj.  One  has  Ee\A'j\  =  0  for  j  >  2  and,  from 
Proposition  8,  (w'(fl))^  <  f^s[(h2)*]  <  K'f^  (since  r  >  1)  and  f;j,,[(>l')*]  <  EB^,[(h!jY  +  ^'hT]]  ^ 
Ar^(l  +  Kr)  for  all  j  >  1.  Then,  from  the  same  reasoning  as  above,  with  W*  replac^  by 


E9. 


fh\{9,s,i^)  ^  SK’,{1  +  Kr)  drf  K 


It  remains  to  show  that  w'{0)  =  w\6).  Using  Proposition  9  for  the  first  equality  and  the  expected- 
value  version  of  the  renewal-reward  Theorem  (Wolif  1989))  for  the  second  one,  one  has 


Furthermore, 


ita. 

1— »oo  t  <—*00  t 
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Then,  since  0  is  compact,  u;J(-,s)/t  is  integrable  over  0.  From  the  Fundamental  Theorem  of 
Calculus  (e.g..  Theorem  8.21  in  Rudin  1974),  one  has 

t  t  Js^i.  t 

Taking  the  limit  as  t  — ►  oo,  using  the  dominated  convergence  theorem  to  interchange  the  limit  with 
the  integral,  and  then  differentiating,  yields 

tn'(tf)  =  lim  tnj(tf,s)/t=  ■ 

<— *00 


PROPOSITION  11.  Under  Assumption  A,  (12- IS)  hold. 


PROOF.  Equation  (12)  follows  easily  from  (43)  and  the  fact  that 


-u,(d))'  = 


ht{8,s,uj) 


< 


ht{9,s,u) 

t 


The  proof  of  (13)  is  similar.  ■ 


PROPOSITION  12.  For  a  given  regenerative  cycle,  let  r  6e  the  number  of  customers  in  that 
cycle  and  h'^(6,Q,u)  as  in  (33).  Under  Assumption  A,  one  has 


w'{e) 


EelhU9,0,u:)] 

Es[r] 


(45) 


PROOF.  This  has  been  shown  in  the  proof  of  Proposition  10.  I 


PROPOSITION  13.  Under  Assumption  A,  asr  —*  oo,  (34)  has  bounded  variance  and  converges 
in  quadratic  mean  to  w'{0),  uniformly  with  respect  to  9,  for  6  0. 


PROOF.  Define  A'j  =  h'-  -  w'(9)Tj.  As  seen  in  the  proof  of  Proposition  10,  £s[A']  =  0, 
Ee[{A'jf]  <  +  Kr),  and  EslA'^A'i]  =  0  for  j^i.  Then, 


Ee 


^2 


-wX9) 


=  Eg 


<  Eg 


As  r  goes  to  infinity,  this  converges  to  zero  uniformly  in  9.  This  also  provides  a  uniform  upper 
bound  on  the  variance  of  (34).  | 


PROPOSITION  14.  Consider  the  truncated  horizon  LR  derivative  estimator  (20),  under  As¬ 
sumption  B.  Then,  for  each  Oq  £  Q,  there  is  a  neighborhood  T  of  9o  such  that  for  all  9  £  T  and 
s  £  S,  (20)  is  an  unbiased  estimator  of  w\(9,s)  with  finite  variance.  Further,  each  Wt{-,s)  is 
continuously  differentiable  over  0. 
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PROOF.  Here,  for  fixed  u>,  ht(fi,3,Lj)  does  not  depend  on  0.  Since  each  b«(Ci)  assumed 
differentiable  in  0,  A2  (a)  in  L’Ecuyer  (1993'  is  satisfied.  Let  if  >  1,  cq  >  0,  and  9  satisfy  B  (in). 
Then,  using  (41), 


Ee 


|fl-ffol<«o  V  }  Itf-Sok^o  V°9(C)/ 


<  Eg  sup 


/ MC)  |;^»(C)y  +  A"* 


VMC)  h,{0  )  J 


+  +  2t*£:i 


(s4 


<  Ea 


sup 

|tf— flol<to 


(^InMo)'  +A'*  +  2tV  +  2t%(C') 


<  oo. 


The  conclusion  the.;  follows  from  Proposition  2  of  L’Ecuyer  (1993)  (with  q  —  bg).  ■ 


PROPOSITION  15.  Under  Assumptions  B  (i-iii),  ti(9),  i(9),  and  w(9)  are  finite  and  contin¬ 
uously  differentiable  in  9,  for  9  £  Q.  Also,  ^,^(9,u)  and  il3t{9,u),  defined  in  (23)  and  (24),  are 
unbiased  estimators  of  u'{9)  and  l'{9),  respectively,  for  9  in  a  small  enough  neighborhood  of  9o. 


PROOF.  We  first  prove  the  second  part  of  the  proposition  and  for  that,  we  will  use  Proposition 
3  of  L’Ecuyer  (1993).  For  fixed  u,  r  and  Yii=i  depend  on  9.  Therefore,  r,  51^=1 

and  each  6s(Ci)  are  differentiable  in  9  everywhere  in  0°.  This  implies  A2  (a)  in  L’Ecuyer  (1993) 
with  t  replaced  by  r.  ^From  B  (i),  there  is  an  s  >  0  such  that  for  all  s  <  5,  E[e*^]  <  oo.  Then, 
from  Theorem  III.3.2  in  Gut  (1988),  page  81,  there  is  an  ti  >  0  such  that  £[e‘>’’]  <  oo.  Let 
0  <  A’  <  €o,  and  9  satisfy  B  (iii).  One  has 


T 


n 


sup 

|S-*ol<«o 


< 


< 


e 


«lT 


and,  by  a  similar  stochastic  ordering  argument  as  in  the  proof  of  (38), 


Eg  [a:*’’]  <  A,-[e“^]  <  A[e'»n  <  oo. 


iFrom  Wald’s  equation  and  B  (in), 


Ee 


[j=l  l^-^oK^o 


< 


Eg[T]Eg 

Eg{T\Eg 


sup 

|«-flol<tO 


K*  sup 
|S— (ol<co 


<  oo. 


Then,  from  (39),  all  the  requirements  of  A3  in  L’Ecuyer  (1993)  are  satisfied,  with  h(9,uj)  there 
replaced  by  either  r  or  J^-i  W*  (which  here  do  not  depend  on  9  for  u  fixed),  and  ri,(C)  =  A”®. 
This  holds  in  a  neighborhood  of  9q  for  each  €  0-  This  implies  the  result,  except  for  the  continuous 
differentiability  of  w,  which  follows  from  (6)  and  the  continuous  differentiability  of  u  and  1.  I 


PROPOSITION  16.  Under  Assumptions  B,  sup^g©  Ee[^]  <  oo. 
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PROOF.  Let  K  >  1.  From  B  {«v),  for  each  Bq  €  0,  there  is  an  open  interval  T(tfo)  =  {Bo  — 
fo,Bo  +  <o).  a  €  ©,  and  a  constant  K{Bo)  <  cx>  such  that 


It  follows  that 


sup 

«eT(flo) 


<  K{Bo). 


sup  = 

tf€T(»o) 


< 


sup  Eg 
tf€T(eo) 


bsiO 

bgiO 


KEs 


sup 
L«€T(«o) 


<  Kk{Bo). 


Now,  {T(5o)i  Bo  €  0}  is  a  family  of  open  sets  that  covers  0.  Since  0  is  compact,  there  is  a  finite 
subset  of  that  family,  say  . . . ,  that  covers  0,  and  one  has 


sup£'tf[df]  <  max  Kk(B^’^)  <  oo.  § 
0^6  '<*<^ 


PROPOSITION  17.  Consider  the  LR  estimator  (20).  Under  Assumption  B, 


sup 

«€©,  »<c.  «>i 


^^(B,  SyU) 

<3 


<  00. 


PROOF.  Since  <  oo  (Proposition  8),  from  §VIII.2  of  Asmussen  (1987),  since  £'«,#[(1F’)^)  - 
^«,c[(W^*)^]i  and  from  Proposition  16,  there  exists  a  constant  Kd  <  oo  such  that 

sup  +  df]  <  Kd. 

«€d,  »<c,  •>! 

Recall  that  E[dj]  =  0  and  that  the  dj's  are  independent.  Then, 


t  t 


1/2 


<  t*  sup  £«..[(IF.*)^]  x:  E 


l<i<t 


i=l  j=l 


<  [t*Kdt‘^KdYf^  =  t^Kd.n 


PROPOSITION  18.  Suppose  that  Assumption  B  holds.  Then,  as  r  -*  oo,  the  regenerative  LR 
estimator  (25)  has  bounded  variance  and  converges  in  quadwtic  mean  to  w'(0),  uniformly  with 
respect  to  6  in  Q. 


PROOF.  ^From  Proposition  8,  £s(r*]  <  Kr  and  Es  l(f»T(^,0,u>))*]  <  K/t.  i,From  Theo¬ 
rem  1.5.2  in  Gut  (1988),  there  is  a  constant  K,  independent  of  B,  such,  that  1  <  A”*  <  oo 


and  £,[(5r(fl,0,w))«]  =  £9  [(E;=i  <  A'..  Let  K  =  max(A'r,  A"/,,  AT.).  Define  = 

hjSj  -  w{0)TjSj  -  v}'{0)Tj  and  Ajj  =  hj  -  w{d)Tj.  Note  that  Eb[A\j]  =  E0[A2j]  =  0,  since 
w{0)  =  and  =  (EalhjSj]  -  w(0)EalrjSj})/E0[rj]  (from  Proposition  15).  Also, 

since  fJsfrj]  >  1  (used  in  the  first  two  lines),  one  has 


w(0} 

w'(0) 


Eg 


< 

< 

< 

< 

< 

< 

< 

< 

< 


Ealhj]  < 


EalhjS,]  -  w(0)E0[r,Sj] 

(E0[h*]y/*(E0lSfjy'^  +  K^/*(E0[rf]E0[S^]y/^ 

A"3/*  +  A'‘/^  < 

2E0lh]S^^]  +  4w^(0)E0[rfS^]  +  4(u<'(tf))2£4r/] 
2(£0lhj]y/^(E0lSf]y^^  +  4A'»/2(£4rf]£«[S«])‘/‘‘  +  16A'(f;9[r«])>/'‘ 
2K^I^  4- 4K  4- <  22A'3/^ 

Eaiih,  -  u,(«)r,)^l 

8£9[/iJ]  +  8{w{0)yEB[T*] 

8A' +  8A' •  A"*/*  <  16A'3/*, 


<  ^EeiAty  <  leA-^/Vr'. 


Keeping  in  mind  that  Tj  >  1  and  EsfAij]  =  0  for  each  j,  one  obtains 


<  Eb 


<  2E0 


;  ( -  ■»'(») i;r,  +  (Er,')  fw(») t--.  -  E'.,')  t'A 
^  ;=i  j=i  \7=i  /  V  ;=i  /  r=i 


<  lEB[A\y  +  2i^^^EB[r*S*^' 

<  ~(22K^I*4-l6K^f*K^I'^)  < 

T 

As  r  — ►  00,  this  converges  to  zero  uniformly  in  6.  These  inequalities  also  provide  a  uniform  upper 
bound  of  76A'*/^/r  on  the  variance  of  (25).  I 


PROPOSITION  19.  Under  Assumptions  A  (i)  and  C,  for  each  t  >  1  and  s  €  S,  uj{6)  and 
v}t(9,  s)  are  convex  in  0  over  0. 

PROOF.  Since  C»  =  Bg^(Ui)  and  from  (2),  each  Wi  and  Wf  are  convex  in  0  (for  fixed  Ui's). 
Therefore,  for  each  («,<)»  tn«(®,3)  is  convex  in  0.  This  implies  that  u»(tf)  =  limt-.oo  v>t{0,s)/t  is  also 
convex  in  0.  iFrom  Assumption  C,  it  follows  that  a(^)  is  convex.  ■ 
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PROPOSITION  20.  Suppose  that  Assumptions  A-C  hold,  that  the  system  was  ongtnally  started 
from  state  J  =  0,  and  that  the  service  time  of  the  j-th  customer  overall  has  distribution  with 
€  ©  (the  ’s  can  be  different  and  might  even  be  the  values  taken  by  correlated  random  variables, 
provided  that  these  values  are  in  0J.  Let  v,  be  defined  as  in  (28)  and  K'^  be  as  in  Proposition  8. 
Then, 

'  1  *+t  •  ' 

7  E 


sup  E 

*>o,  t>i 


t=k+l  j=v. 


<K- 


(46) 


Here,  E  denotes  the  expectation  associated  with  the  above  sequence  of  Bj 's  and  we  assume  that  it 
is  well  defined.  [Note  that  here,  we  do  not  assume  that  €  S.) 


PROOF.  Suppose  first  that  all  the  service  times  follow  the  distribution  B.  Then,  the  queue  is 
stable  (Asmussen  1987,  Chapter  VIII).  Let  r  be  the  number  of  customers  in  a  regenerative  (busy) 
cycle,  let  6i  —  Yl)=v,  view  for  the  moment  6[  as  a  “cost”  associated  with  customer 

i.  The  expected  “cost”  over  a  regenerative  cycle  is  then,  using  the  same  argument  as  in  the  proof 
of  (40)  and  assuming  that  s  =  0, 


E*? 

1=1 


<  E 


1=1  / 


<  K- 


i,From  the  renewal- reward  theorem  (Wolff  1989),  one  then  has 


.iL”  ^ 

fsl 


E<? 


Li=l 


/£:[r]  ■*=  k  <  Kl 


(47) 


We  will  now  show  that  is  stochastically  non-decreasing  in  i.  For  fixed  values  of  Ci,  u,,  and  r,, 
v,+i  is  a  non-increasing  function  of  W,,  and  it  is  easily  seen  that  (W,+i,i,+i)  is  a  non-decreasing 
function  of  (lVi,S,)  for  any  value  of  U,+i.  Since  {Wi,i|)  =  (0,r(t^i))  while  (^2,62)  >  (0,r(I/2)), 
it  follows  that  (IVi,ii)  is  stochastically  dominated  by  (^2,^2)  by  induction  on  t,  that  (W,,6,) 
is  stochastically  non- decreasing  in  i.  Then,  is  non-decreasing  in  »,  and  from  (47),  it  follows 
that  <  lim,^oo  =  K  < 

We  will  complete  the  proof  using  stochastic  ordering  arguments  similar  to  those  used  in  the 
proof  of  Proposition  8.  For  fixed  Uj,  replacing  Be  by  B  for  customer  j  increases  (j  and  does  not 
affect  the  other  service  and  interarrival  times.  Clearly,  increasing  a  service  time  can  never  split  a 
busy  period,  i.e.  can  never  increase  any  v,.  Therefore,  r  and  each  Si,  generated  under  Be  or  under 
the  assumptions  of  the  Proposition,  are  stochastically  dominated  by  r  and  6^  generated  under  B. 
This  implies  that  ElS^]  <  k[S[]  <  K'^,  where  E  is  the  same  as  in  (46).  The  expectation  in  (46), 
which  is  the  second  moment  of  the  average  of  Sk+i , . . . ,  ^t+t,  is  then  bounded  by  K/^.  I 

PROOF  of  Proposition  2.  ^From  Propositions  14  and  15,  Wt{-,s)/t  and  «;(•)  are  con¬ 
tinuously  differentiable  in  0  for  each  s  €  5  and  (  >  0.  Further,  the  continuity  of  w'(B)  with 
respect  to  B  is  uniform  in  B  over  0,  because  the  latter  set  is  compact.  Also,  from  Proposi¬ 
tion  11,  (13)  holds.  iFrom  Taylor’s  theorem,  one  has  -  «’'<„(®nian))/fn  for 

^  ^  •  Note  that  as  n  -♦  c»,  one  has  B*  -  B~  0  and  therefore  — *  0  a.s..  Then, 

"  «''(^n)]  +  "  tn'(«„))  +  (»'(»„)  -  «>;„(«»,  s„)/t„]  and  each  bracketed 
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term  converges  to  zero  a.s.,  uniformly  in  (^n.^n)*  from  (13)  and  from  the  uniform  continuity  of  w'. 

■ 

PROOF  of  Proposition  3.  iFrom  Proposition  10,  the  mean-square  error  of  ht{0,  s,ij)/t 
is  in  0(1/1),  uniformly  in  (9,s),  so  that  which  is  the  conditional  variance  of  K,,  is  in 

and  52^1  <  oo  a.s..  ^From  Proposition  11,  lim„^oo *=  0  and  the  result 

then  follows  from  Proposition  1.  ■ 

PROOF  of  Proposition  4.  ^From  Proposition  10,  the  r^’s  and  Aj’s  have  uniformly  bounded 
second  moments.  Therefore,  the  conditional  variance  of  V),  is  in  0{t~^c~^)  and,  since  f„  >  1  for 
each  n,  <  oo  for  0  <  q  <  1/2.  iFrom  Proposition  15  and  since  0 

is  compact,  u(  )  and  t(  )  are  continuously  differentiable,  uniformly  over  0.  It  is  then  easy  to  see, 
using  (16)  and  Taylor’s  theorem,  that  /?„  — ►  0  a.s.,  where  (3n  here  is  the  difference  between  (16) 
and  (8)  evaluated  At  8  =  9„-  The  result  then  follows  from  Proposition  1.  | 

PROOF  of  Proposition  5.  ^From  Proposition  14,  V><„(^n,  Sn.‘*'n)  is  an  unbiased  estimator 
of  w[^{8n,Sn),  so  that  =  0.  iFrom  Proposition  11,  we  know  that  — *  0  a.s.  when 

fn  0-  i.From  Proposition  17,  there  exists  a  constant  K'd  <  oo  such  that  <  Kdtn  for 

all  n.  Therefore,  ^  12T=:i  ^ dinn~^  <  oo.  The  first  result  then  foUows  from 

Proposition  1. 

For  (b).  Proposition  18  says  that  as  r  — •  oo,  (25)  has  bounded  variance  and  converges  in 
quadratic  mean  to  w'{8),  uniformly  in  0.  This  implies  uniform  convergence  in  expectation.  Then, 
limn-,.00  /^n  =  0  a.s.,  the  variance  of  Yn  in  (26)  is  uniformly  bounded,  and  Proposition  1  applies. 

For  (c),  one  has  /3„  =  0.  iFrom  Proposition  8  and  the  proof  of  Proposition  18,  Tj,  hj,  TjSj,  and 
hjSj  have  bounded  second  moments  for  each  j,  uniformly  in  Therefore,  the  variance  of  (27)  is 
bounded  uniformly  in  0n  and  the  result  follows  again  from  Proposition  1.  I 

PROOF  of  Proposition  6.  iFrom  Proposition  9,  h[^(0n,3n,Un)  is  an  unbiased  estimator  of 
u;J^(^„,s„),  so  that  0^  =  0.  iFrom  Proposition  11,  we  know  that  /?„  =  — ►  0  a.s.  when  —  0. 

iFrom  Proposition  10,  the  variance  of  h[^(0n,Sn,u)n)/tn  is  bounded  uniformly  in  0n  and  t„.  The 
first  result  then  follows  from  Proposition  1. 

For  (b).  Proposition  13  says  that  as  r  — ►  00,  (34)  has  bounded  variance  and  converges  in 
quadratic  mean  to  w'(8),  uniformly  in  0.  This  implies  uniform  convergence  in  expectation.  Then, 
limn->oo  0n  =  0  a.s.,  the  variance  of  Yn  in  (35)  is  uniformly  bounded,  and  Proposition  1  applies. 

For  (c),  0n  =  0  for  each  n.  iFrom  Proposition  8,  Tj  and  h'  have  bounded  second  moments.  So. 
the  variamce  of  (27)  is  bounded  uniformly  in  and  the  result  follows  again  from  Proposition  1.  ■ 

PROOF  of  Proposition  7.  We  will  verify  W1  to  W5  of  Appendix  I  and  the  result  will 
follow  from  Theorem  1.  For  this  proof,  we  will  redefine  differently  the  state  of  the  Markov  chain. 
Remove  the  restriction  s„  <  c  and  redefine  the  state  at  iteration  n  of  SA  as  s„  =  (xmUn)*  where 
Xn  is  the  sojourn  time  of  the  last  customer  of  iteration  n  —  1  (zj  =0),  and  On  is  the  value  of  the 
IPA  accumulator  at  the  beginning  of  iteration  n.  Here,  we  assume  that  the  arrival  time  of  the  first 
customer  of  an  iteration  is  “unknown”  (not  part  of  the  state)  at  the  beginning  of  the  iteration. 
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We  do  that  in  order  to  facilitate  the  verification  of  the  continuity  conditions  required  in  W 1 .  Let 
s  =  (x,o)  be  the  system  state  at  the  beginning  of  an  iteration,  k*  be  defined  as  in  (32), 


v*  =  afcr  +  E  E 

i=l  ;=«;, 

and 

^  yvt\  Hk:  =  t)a  +  . 

Here,  0*  is  the  value  of  the  IPA  estimator  (31),  while  the  other  two  components  of  ^  give  the  initial 
state  for  the  next  iteration.  At  iteration  n,  (fl,i,o)  =  (fln)Xn,fln)  and  ^  =  {„  =  (V)*,i„+i,an+i). 
Since  <„  is  fixed  at  t,  Pe,x,aiin  €  •)  does  not  depend  on  n. 

To  prove  the  weak  continuity,  let  g  :  — ►  R  be  continuous  and  bounded  in  absolute  value  by 
a  constant  Kg.  We  need  to  show  that  £fl,i,a[5(^)]  continuous  in  {9,x,a).  One  difficulty  is  that  for 
fixed  Uj's,  the  components  of  f  are  discontinuous  in  6.  To  prove  the  continuity  of  the  expectation, 
we  will  use  a  likelihood  ratio  approach.  Let  8q  £  Q,  K  >  1,  to,  and  0  be  as  in  B  (in).  Let  xo  >  0 

and  oo  >  0.  Let  us  view  u  as  (r/oiCi _ and  assume  that  u  is  generated  under  Pg.  Now, 

for  1^  -  flol  <  €o,  I  >  0  and  a  >  0,  define 

V  .=ij=v. 

-Aaoklo  +  i:  E  /(fcr.o  =  0ao+  E 

\  .=Xi=v.,o  >=v,,o  / 

where  k*Q,  Vj,oi  and  W*q  are  the  respective  values  of  k“,  v,,  and  W*  when  x  is  replaced  by  Xg 
while  w  =  (I'ojCti  •  •  •iX'e-iiCt)  remains  the  same.  Let  /(i,io)  =  1  if  v«,o  ^  for  at  least  one  i,  and 
/(i,io)  =  0  otherwise.  Note  that  /(i,io)  =  0  implies  that  k’g  =  k*.  Also, 


t 


P,-(/(i,Xo)  =  l)  <  ^Pg 


i=l 

t 


i=i 


<  |x  -  lol 


Ps 


=  E^^ 

«=1 

<  2tKu\x  -  xol, 


li/Q  -  xo  +  z]  <  lx  -  Xol 


E(‘"j  -  Cj)  =  z 

>=i 


where  Eg  integrates  over  the  values  of  z,  and  Ku  is  a  bound  on  the  density  of  the  interarrival  time 
vg.  Conditional  on  /(i,xo)  =  0,  A(^,x,a,u;)  is  continuous  in  (tf,x,a),  because  g  is  continuous, 
W*  is  continuous  in  x  and  does  not  depend  on  (^,o),  bg{Q  C)  are  continuous  in  B  for  each 
k^Q  =  jfc*,  and  Wi,o  =  v*  for  each  i.  Further,  lA(0,x,o,fa;)|  is  bounded  by  2KgK*  and  is  zero  when 
{B,x,a)  =  (^o.z^o.oo)-  Therefore, 


(S,x,a)-*(So.xo.<>o) 

»®o) 
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<  lim  Eii[A{9,x,a,u)^\-I(x,xo))]  +  EA2K,K^I{x,xo 

(9,Xt(i)^\voiXOfao)  ^ 


<  Eg 

=  0, 


lim 

(8,x,a)—{So,xo,ao) 


A{6,x,a,u){l  -  I{x,xo)) 


+  lim  2K.K‘2iK^\x  -  xo| 

(0,x,a)->^(0Ot^o  »«o) 


where  the  dominated  convergence  theorem  has  been  used  to  pass  the  limit  inside  the  expectation. 
This  proves  the  required  weak  continuity.  This  also  implies  (as  a  special  case)  that  J?s,x,a[V'*]  is 
continuous  in  {9,x,a),  which  verifies  the  second  requirement  of  Wl,  with  c  =  0. 

For  fixed  0  €  Q,  since  the  system  is  stable,  {fn. «  ^  1}  is  regenerative  and  is  a  Markov  chain  with 
some  steady-state  distribution  (see  Asmussen  1987,  chapter  VIII).  Regeneration  occurs  when¬ 
ever  an  iteration  starts  with  an  empty  system.  iFrom  Proposition  20,  sup„>i 
and  sup„>i  Ao[a5^]  <  K'^.  This  yields  W3.  By  similar  arguments  as  in  the  proof  of  Propo¬ 
sition  8,  one  can  show  that  sup„>i  <  Kh-  Take  K  =  max(A'/,,  A'j^).  For  any  c  > 

0,  one  has  if  >  Eo[{rjtn?\  >~ {ZK lt)P[{rjtn?  >  SA'/c],  so  that  sup„>,  P[(^;/t„)2  > 
SA'/e]  <  c/3.  Similarly,  sup„>i  >  3A/C]  <  c/3  and  sup„> j  P[a*  >  3A/c]  <  c/3.  Then, 
sup„>i  P[max((V’^/t„)^,x^^i,a*^j)  <  3A/c]  >  1  —  c.  This  reasoning  also  holds  for  9  varying  in 
any  manner  inside  0.  This  implies  the  tightness  properties  required  in  Wl. 

For  W2,  let  C  be  a  compact  subset  of  IR  x  5,  c  <  oo  such  that  C  C  [0,  c)^,  and  let  6  C.  Let 
i  denote  the  i-th  customer  overall  and  nt  -f  1  -I-  r*  be  the  index  of  the  first  non-waiting  customer 
from  the  beginning  of  iteration  n  -f  1.  One  has  t*  =  0  if  iteration  n  -i-  1  starts  with  a  new  busy 
cycle  and  otherwise,  r*  is  the  number  of  customers,  from  the  beginning  of  iteration  n  -f  1,  who 
are  in  the  same  busy  cycle  as  the  last  customer  of  iteration  n.  ^From  the  same  argument  as  in 
the  proof  of  Proposition  8,  there  exists  At(c)  <  co  such  that  Es=e[{‘ri)^]  <  Kric).  Then,  from 
straightforward  stochastic  ordering,  £[(r*)^  |  ^„]  <  P,=sc[(’'i  )*]  ^  A't(c).  This  implies  that  for  all 
c  >  0,  P[r;  >  At(c)/c  I  C„]  <  c.  Let  c  >  0,  n*(c)  =  \Kr{c)/('\,  c~=  (3A/c)V2,  ^  ^  Let 

Tic  =  1  +  ["“(c)/^]  and  I  >  Tie.  For  each  0  <  j  <  tic,  from  the  same  argument  as  we  used  above  to 
prove  Wl,  one  has  P[^n-K  €  C  |  r*  =  j]  >  1  -  c.  Then, 

n-(c) 

FK„+i€CU„]  >  r;  =  jU„] 

J=0 

n*(c) 

j=0 

>  (l-c)P[r:<Ti-(c)U„]  >  (1-c)*. 

Here,  P  denotes  the  probability  law  associated  with  the  Markov  chain  {(„,  ti  >  1}  when  9  varies 
according  to  the  algorithm  and  tIc  can  be  interpreted  as  a  time  that  we  give  to  the  system  to 
stabilize.  Roughly,  if  c  is  larger,  the  initial  state  could  be  larger  (e.g.  large  initial  queue  size),  and 
we  will  take  a  larger  tIc.  This  implies  W2. 

When  9  is  fixed,  from  Proposition  9,  6i  is  an  unbiased  estimator  of  the  derivative  of  the  expected 
system  time  of  the  i-th  customer  (overall).  Then,  x/;*  is  unbiased  for  the  gradient  of  the  expected 
total  system  time  of  customers  nt, . .  .,(ti  -|-  l)t  —  1.  When  n  — ♦  oo,  from  (13),  the  expectation  of 
^*/t„  -|-  C'{9)  thus  converges  to  a' (9).  Therefore,  v{9)  =  a'(S)  and  W5  follows*.  ■ 

’The  work  of  the  first  author  was  supported  by  NSERC-Canada  grant  no.  OGPOl  10050  and  FCAR-Qa4bec  grant 
no.  EQ2831.  The  second  author’s  work  was  supported  by  the  IBM  Corporation  under  SUR-SST  Contract  12480042 
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